Bug 2023624 - Some older repos have comments that have empty comment body and user object is null. The ETL script needs to handle these better instead of crashing #12
dklawren
commented
Mar 16, 2026
- extract_reviewers: filters out any review where user is null, preserving empty-body reviews (e.g. approve without comment)
- extract_comments: filters out any comment where user is null or body is empty
- transform_data: defensive (review.get("user") or {}) so a null user won't raise AttributeError even if it somehow reaches the transform
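The three changes listed above could look roughly like the sketch below. It is not the code from this PR: the function names `filter_reviews`, `filter_comments`, and `reviewer_login` are hypothetical, and it assumes each review or comment is a dict parsed from the GitHub API with `user`, `body`, and `user.login` fields.

```python
import logging

logger = logging.getLogger(__name__)


def filter_reviews(reviews):
    """Keep reviews that have a user; an empty body is fine (e.g. a bare approval)."""
    kept = [r for r in reviews if r.get("user") is not None]
    skipped = len(reviews) - len(kept)
    if skipped:
        logger.warning("Skipped %d review(s) with null user", skipped)
    return kept


def filter_comments(comments):
    """Keep comments that have both a user and a non-empty body."""
    kept = [c for c in comments if c.get("user") is not None and c.get("body")]
    skipped = len(comments) - len(kept)
    if skipped:
        logger.warning("Skipped %d comment(s) with null user or empty body", skipped)
    return kept


def reviewer_login(review):
    """Defensive lookup: `or {}` covers a `user` key that is present but null."""
    return (review.get("user") or {}).get("login")
```

Filtering at extraction time keeps bad rows out of the pipeline, while the `or {}` guard in the transform step is a second line of defence in case a null user slips through.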
Pull request overview
Adds defensive handling for GitHub PR reviews/comments where user is null or body is empty, preventing ETL script crashes on older repositories with malformed data.
Changes:
- extract_reviewers: Filters out reviews with null user and logs skipped count
- extract_comments: Filters out comments with null user or empty body and logs skipped count
- transform_data: Uses (review.get("user") or {}) pattern to safely handle null user objects
shtrom
left a comment
I don't think the error we saw will be addressed by this fix.
Is there a way you could run one iteration of the loop against a given PR locally? That would make it easier to reproduce topical issues.
shtrom
left a comment
Actually, I didn't consider all options.
I thought review== so that review.get raises that error
however I now think you may be right about review['user']==None, in which case your fix is correct.
In any case, I think that calls for some local one-shot runnability to confirm (:
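For reference, the two candidate causes discussed here behave differently under the `or {}` guard. The snippet below is only an illustration with made-up inputs, not a reproduction of the actual failure. The helper names are hypothetical and the `login` field is assumed.

```python
def login_unsafe(review):
    # The default in .get("user", {}) applies only when the key is missing,
    # not when it is present with a null value.
    return review.get("user", {}).get("login")


def login_safe(review):
    # `or {}` replaces a null user with an empty dict before the second lookup.
    return (review.get("user") or {}).get("login")


def raises_attribute_error(fn, arg):
    """Return True if calling fn(arg) raises AttributeError."""
    try:
        fn(arg)
    except AttributeError:
        return True
    return False


review_is_none = None                        # the whole review object is null
user_is_none = {"user": None, "body": ""}    # only the user field is null

# A null user breaks the unsafe lookup but not the guarded one.
print(raises_attribute_error(login_unsafe, user_is_none))
print(raises_attribute_error(login_safe, user_is_none))

# A null review breaks both, so the guard alone would not cover that case.
print(raises_attribute_error(login_safe, review_is_none))
```

If the crash came from a null `user`, the guard is sufficient. If the review object itself were null, it would need to be filtered out before the lookup.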